Cross-lingual Distillation for Text Classification
Cross-lingual text classification (CLTC) is the task of classifying documents
written in different languages into the same taxonomy of categories. This paper
presents a novel approach to CLTC that builds on model distillation, which
adapts and extends a framework originally proposed for model compression. Using
soft probabilistic predictions for the documents in a label-rich language as
the (induced) supervisory labels in a parallel corpus of documents, we train
classifiers successfully for new languages in which labeled training data are
not available. An adversarial feature adaptation technique is also applied
during the model training to reduce distribution mismatch. We conducted
experiments on two benchmark CLTC datasets, treating English as the source
language and German, French, Japanese, and Chinese as the unlabeled target
languages. The proposed approach had the advantageous or comparable performance
of the other state-of-art methods.Comment: Accepted at ACL 2017; Code available at
https://github.com/xrc10/cross-distil
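
As a rough illustration of the distillation objective (a minimal PyTorch
sketch, not the authors' released code; the function name and temperature
value are illustrative), the target-language student is trained to match the
teacher's softened predictions on parallel documents:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # The teacher scores the source-language side of a parallel pair;
    # the student sees the target-language side. Temperature smooths the
    # teacher distribution so small probabilities still carry signal.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Soft cross-entropy, scaled by T^2 as in Hinton et al. (2015).
    return -(soft_targets * log_probs).sum(dim=-1).mean() * temperature ** 2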
An equivalent-effect phenomenon in eddy current non-destructive testing of thin structures
The inductance/impedance due to thin metallic structures in non-destructive
testing (NDT) is difficult to evaluate. In particular, in Finite Element Method
(FEM) eddy current simulation, an extremely fine mesh is required to accurately
simulate skin effects, especially at high frequencies, and this can lead to an
extremely large total mesh for the whole problem, which also includes, for
example, surrounding structures and excitation sources such as coils.
Consequently, the computational cost is high. In this paper, an
equivalent-effect phenomenon is identified: alternative structures produce the
same effect on the sensor response, i.e. the mutual impedance/inductance of
coupled coils, provided a reciprocal relationship between the electrical
conductivity and the thickness of the structure is maintained. By using this
relationship, the mutual inductance/impedance can be calculated from
equivalent structures with far fewer mesh elements, which significantly
reduces the computation time. In eddy current NDT, coil inductance/impedance
is normally used as a critical parameter for various industrial applications,
such as flaw detection, coating and microstructure sensing. Theoretical
derivation, measurements, and simulations are presented to verify the proposed
phenomenon.
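
To make the reciprocal relationship concrete (a minimal sketch; the numerical
values below are illustrative, not from the paper), keeping the product of
conductivity and thickness constant lets a thin foil be replaced by a thicker,
less conductive plate that yields the same coil response while needing far
fewer mesh elements:

def equivalent_conductivity(sigma_orig, thickness_orig, thickness_new):
    # Preserve sigma * thickness so the mutual impedance/inductance
    # of the coupled coils is unchanged by the substitution.
    return sigma_orig * thickness_orig / thickness_new

# Example: replace a 0.1 mm copper foil (5.8e7 S/m) with a 1 mm plate.
sigma_eq = equivalent_conductivity(5.8e7, 0.1e-3, 1.0e-3)
print(f"Equivalent conductivity: {sigma_eq:.2e} S/m")  # 5.80e+06 S/m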
GPTEval: NLG Evaluation using GPT-4 with Better Human Alignment
The quality of texts generated by natural language generation (NLG) systems
is hard to measure automatically. Conventional reference-based metrics, such as
BLEU and ROUGE, have been shown to have relatively low correlation with human
judgments, especially for tasks that require creativity and diversity. Recent
studies suggest using large language models (LLMs) as reference-free metrics
for NLG evaluation, which have the benefit of being applicable to new tasks
that lack human references. However, these LLM-based evaluators still have
lower agreement with human judgments than medium-sized neural evaluators. In
this work, we present GPTEval, a framework that uses large language models
with chain-of-thought (CoT) reasoning and a form-filling paradigm to assess
the quality of
NLG outputs. We experiment with two generation tasks, text summarization and
dialogue generation. We show that GPTEval with GPT-4 as the backbone model
achieves a Spearman correlation of 0.514 with human judgments on the
summarization task, outperforming all previous methods by a large margin. We
also present a preliminary analysis of the behavior of LLM-based evaluators
and highlight the potential issue of their bias towards LLM-generated texts.
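
To make the reported correlation concrete, meta-evaluating such a metric
amounts to correlating its per-sample scores with human ratings (a minimal
sketch; the scores below are placeholders, not data from the paper):

from scipy.stats import spearmanr

# Hypothetical per-summary quality ratings on a 1-5 scale.
human_scores = [4.0, 2.5, 3.0, 5.0, 1.5, 4.5]
metric_scores = [3.8, 2.0, 3.5, 4.9, 2.0, 4.0]  # e.g., CoT + form-filling outputs

rho, p_value = spearmanr(human_scores, metric_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")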
Language Models can be Logical Solvers
Logical reasoning is a fundamental aspect of human intelligence and a key
component of tasks like problem-solving and decision-making. Recent
advancements have enabled Large Language Models (LLMs) to potentially exhibit
reasoning capabilities, but complex logical reasoning remains a challenge. The
state-of-the-art solver-augmented language models use LLMs to first parse
natural-language logical questions into symbolic representations and then
adopt external logical solvers that take in the symbolic representations and
output the answers. Despite their impressive performance, any parsing error
inevitably causes the execution of the external logical solver to fail,
leaving the logical question unanswered. In this paper, we introduce LoGiPT, a
novel language model that directly emulates the reasoning processes of logical
solvers and bypasses parsing errors by learning strict adherence to solver
syntax and grammar. LoGiPT is fine-tuned on a newly
constructed instruction-tuning dataset derived from revealing and refining the
invisible reasoning process of deductive solvers. Experimental results on two
public deductive reasoning datasets demonstrate that LoGiPT outperforms
state-of-the-art solver-augmented LMs and few-shot prompting methods on
competitive LLMs like ChatGPT or GPT-4.
Comment: Preprint.
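
A rough sketch of what deriving instruction-tuning data from a solver's trace
might look like (the field names, prompt wording, and toy example are
assumptions for illustration, not the paper's released schema):

def build_training_example(question, solver_steps, answer):
    # Pack the deductive solver's (normally invisible) step-by-step
    # trace into one instruction-tuning example, so the model learns
    # to emit reasoning in solver syntax before the final answer.
    instruction = ("Solve the following logical reasoning problem. "
                   "Show each deduction step, then state the answer.\n\n"
                   + question)
    target = "\n".join(solver_steps) + f"\nAnswer: {answer}"
    return {"instruction": instruction, "output": target}

example = build_training_example(
    question="All men are mortal. Socrates is a man. Is Socrates mortal?",
    solver_steps=["mortal(X) :- man(X).",
                  "man(socrates).",
                  "Derived: mortal(socrates)."],
    answer="Yes",
)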
UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization
The high annotation costs and diverse demands of various summarization tasks
motivate the development of few-shot summarization. However, despite the
emergence of many summarization tasks and datasets, the current training
paradigm for few-shot summarization systems ignores potentially shareable
knowledge in heterogeneous datasets. To this end, we propose UniSumm, a
unified few-shot summarization model that is pre-trained on multiple
summarization tasks and can be prefix-tuned to excel at any few-shot
summarization task. Meanwhile, to better evaluate few-shot summarizers under
the principles of diversity and robustness, we assemble and release a new
benchmark, SummZoo. It consists of summarization tasks with multiple sets of
few-shot samples for each task, covering diverse domains. Experimental results
and analysis show that UniSumm outperforms strong baselines by a large margin
across all sub-tasks in SummZoo under both automatic and human evaluation, and
achieves results comparable to a GPT-3.5 model in human evaluation.
Comment: ACL 2023 main conference.
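
As a loose illustration of the prefix-tuning step (a minimal sketch following
the general prefix-tuning recipe of Li and Liang, 2021, not the authors'
implementation; the prefix length and hidden size are illustrative), only a
small per-task prefix module is trained while the pre-trained summarizer stays
frozen:

import torch
import torch.nn as nn

class TaskPrefix(nn.Module):
    # Trainable prefix vectors prepended to the input embeddings; the
    # frozen backbone summarizer is adapted to a new few-shot task by
    # updating only these parameters.
    def __init__(self, prefix_len=20, hidden_size=768):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)

    def forward(self, token_embeddings):  # (batch, seq, hidden)
        batch = token_embeddings.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, token_embeddings], dim=1)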
- …